feat: use opendal as the s3 sdk by default #18011
The following longevity test passed with similar resource usage and source throughput, and with no compaction lag, using the same nightly image before and after switching to opendal:
Thanks for the effort in testing!
BTW, should we make your testing workload part of the standard release testing, and run such tests every time we bump the OpenDAL version or release RisingWave? Can we configure some automatic jobs to enable one-click testing?
All the testing I have done comes from the regular testing pipeline:
Can we trigger perf/longevity tests in a PR? (Not an urgent request.) cc @huangjw806
LGTM and thanks for your great contribution.
I have some questions about this PR and would like some short descriptions added so that we can easily trace the PR in the future!
- When we officially switch to opendal, some behaviours will change, such as whether opendal retries streaming requests (e.g. streaming_read / streaming_upload). I'm not entirely sure about this detail; if there is a change, please add a short description.
- After switching to opendal, the number of retries provided by the configuration may differ from before.
- For example, with aws-sdk we also get the 2 retries the SDK provides by default, which is tricky; after switching to opendal, the number of retries will be reduced.
- opendal does not support retry_unknown_service_error. We have to find out whether opendal's retry implementation covers all the error types/error codes we need.
Retries for both streaming_read and streaming_upload are implemented for the opendal object store, so there is no change here.
The retry attempts for S3 do change after this PR. I have documented the behavior change in the PR description.
OpenDAL doesn't expose the exact error codes, so there is no way for us to check error codes in our code. However, OpenDAL will use the … @Li0k PTAL.
The rest LGTM. Thanks for the testing and all the contributions.
@@ -567,6 +567,7 @@ impl<OS: ObjectStore> MonitoredObjectStore<OS> {
    pub async fn upload(&self, path: &str, obj: Bytes) -> ObjectResult<()> {
        let operation_type = OperationType::Upload;
        let operation_type_str = operation_type.as_str();
        let media_type = self.media_type();
nit: How about using the same variable name? media_type and engine_type are too similar.
Co-authored-by: Zhanxiang (Patrick) Huang <[email protected]>
I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.
What's changed and what's your intention?
Non-S3 object store backends have been using opendal for a while. This PR switches the S3 SDK to opendal by default as well.
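The default selection described above can be sketched roughly as follows. This is a minimal, hypothetical illustration: Backend, Sdk, select_sdk and the s3_use_aws_sdk flag are made-up stand-ins, not RisingWave's actual types or config keys. Every backend resolves to OpenDAL, and S3 only uses aws-sdk-s3 when the fallback is explicitly requested.

```rust
// Hypothetical sketch: these names are illustrative, not RisingWave's real types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Sdk {
    OpenDal,
    AwsSdkS3,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    S3,
    Gcs,
    Azblob,
}

/// Pick the client SDK for a backend. `s3_use_aws_sdk` is a hypothetical
/// stand-in for whatever config option enables the aws-sdk-s3 fallback.
fn select_sdk(backend: Backend, s3_use_aws_sdk: bool) -> Sdk {
    match backend {
        // S3 keeps aws-sdk-s3 only as an explicit opt-in fallback.
        Backend::S3 if s3_use_aws_sdk => Sdk::AwsSdkS3,
        // Everything else, including S3 by default, goes through OpenDAL.
        _ => Sdk::OpenDal,
    }
}

fn main() {
    println!("{:?}", select_sdk(Backend::S3, false));
    println!("{:?}", select_sdk(Backend::S3, true));
    println!("{:?}", select_sdk(Backend::Gcs, true));
    println!("{:?}", select_sdk(Backend::Azblob, false));
}
```

Routing the fallback through a single match arm keeps removal easy: once aws-sdk-s3 is deprecated, that arm and the flag can be deleted without touching the other backends.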
related: #14321
There is a behavior change for S3 retries:
Prior to this PR:
After this PR:
The reason for this behavior change is that we didn't disable the SDK's internal retry when using aws-sdk-s3, which unexpectedly added more retries than specified in the RW config. Given that all object stores other than S3 already honor the retry attempts specified in the [storage.object_store.retry] config, I think it is okay to correct this unexpected behavior in S3 after switching to opendal as well and let it fully honor our config.
Checklist
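The following small Rust simulation shows why leaving the SDK's internal retry enabled adds more retries than configured. It rests on a few assumptions. The configured value is treated as a maximum attempt count. The store fails every request. aws-sdk-s3 is modeled as making up to 3 attempts per call (1 try plus the 2 default retries mentioned in review). The function names are hypothetical. When both layers retry, the attempts multiply. With the SDK retry disabled, only the configured attempts are made.

```rust
/// Hypothetical stand-in for an SDK call with internal retries: it sends up to
/// `sdk_max_attempts` requests before surfacing the error to the caller.
fn sdk_call(sdk_max_attempts: u32, requests_sent: &mut u32) -> Result<(), &'static str> {
    for _ in 0..sdk_max_attempts {
        // One request reaches the (simulated, always failing) object store.
        *requests_sent += 1;
    }
    Err("service unavailable")
}

/// Hypothetical RisingWave-level retry loop wrapping the SDK call; returns the
/// total number of requests that reached the store.
fn total_requests(rw_max_attempts: u32, sdk_max_attempts: u32) -> u32 {
    let mut requests_sent = 0;
    for _ in 0..rw_max_attempts {
        if sdk_call(sdk_max_attempts, &mut requests_sent).is_ok() {
            break;
        }
    }
    requests_sent
}

fn main() {
    // SDK retries left enabled: the two retry layers multiply.
    println!("nested retries: {}", total_requests(3, 3));
    // SDK retries disabled: only the configured attempts are made.
    println!("config only:    {}", total_requests(3, 1));
}
```

With 3 configured attempts, the nested setup sends 9 requests to a failing store, while the single-layer setup sends 3.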
./risedev check (or alias, ./risedev c)
Documentation
Release note
The SDK that connects to the S3 object store state backend is switched from aws-sdk-s3 to opendal. This is an internal third-party dependency change, intended to unify the SDK RisingWave uses to connect to different object store backends and to reduce the maintenance burden. This change should be seamless, but if you experience instability or unexpected errors when running RisingWave on S3, please file an issue in our GitHub repo.
We will keep aws-sdk-s3 as a fallback option until we fully deprecate it in 1-2 releases. You can use the following config to switch back to aws-sdk-s3: